
Conversation

@AnilSorathiya AnilSorathiya commented Oct 10, 2025

Pull Request Description

What and why?

This adds support for DeepEval's GEval LLM evaluation metric so that users can define their own evaluation criteria, with the results logged in the model documentation through ValidMind tests.

  • GEval scorer
  • Demo notebook

How to test

Run the notebook notebooks/code_sharing/geval_deepeval_integration_demo.ipynb
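
As a rough orientation before running it, the core of what the notebook exercises is DeepEval's GEval metric. The snippet below is a minimal standalone sketch using DeepEval's documented API; the criteria text and test case are illustrative placeholders, it assumes an LLM judge is configured (e.g. OPENAI_API_KEY is set), and the ValidMind-side wiring (building the dataset and logging the scorer results) is covered in the notebook itself.

```python
# Minimal standalone sketch of a GEval evaluation (illustrative values only).
from deepeval.metrics import GEval
from deepeval.test_case import LLMTestCase, LLMTestCaseParams

# Hypothetical criteria; in the notebook you define your own.
correctness = GEval(
    name="Correctness",
    criteria="Determine whether the actual output answers the input accurately.",
    evaluation_params=[LLMTestCaseParams.INPUT, LLMTestCaseParams.ACTUAL_OUTPUT],
)

test_case = LLMTestCase(
    input="What is the capital of France?",
    actual_output="The capital of France is Paris.",
)

correctness.measure(test_case)
print(correctness.score, correctness.reason)
```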

What needs special review?

Dependencies, breaking changes, and deployment notes

Release notes

Checklist

  • What and why
  • Screenshots or videos (Frontend)
  • How to test
  • What needs special review
  • Dependencies, breaking changes, and deployment notes
  • Labels applied
  • PR linked to Shortcut
  • Unit tests added (Backend)
  • Tested locally
  • Documentation updated (if required)
  • Environment variable additions/changes documented (if required)

@juanmleng juanmleng left a comment

LGTM!

@github-actions

PR Summary

This PR implements multiple improvements across the project:

  1. A new notebook (geval_deepeval_integration_demo.ipynb) has been added to showcase the integration between DeepEval's GEval metric and the ValidMind library. The notebook covers installation, environment setup, creating test cases, assigning scores, and plotting the results with customized box plots.

  2. Several dependency version updates have been applied in the poetry.lock and pyproject.toml files. Notable changes include downgrading or constraining versions of packages such as numpy, blis, datasets, fsspec, pyarrow, and related extras. This helps to ensure compatibility with supported Python versions and may resolve potential issues with newer releases.

  3. The LLM dataset conversion function in validmind/datasets/llm/agent_dataset.py was refactored. The logic that converts test cases and goldens to a DataFrame has been split into helper methods (_process_test_cases, _process_goldens), with a new helper (_add_optional_fields) to streamline handling of optional fields, and a safeguard now returns an empty row if no data is present (a hypothetical sketch of this structure appears in the second code block after this list).

  4. In the plotting module (validmind/tests/plots/BoxPlot.py), minor modifications were made to reference the private dataframe attribute (_df) and adjust plot dimensions and spacing to improve visualization clarity.

  5. A new scorer module for GEval (validmind/scorer/llm/deepeval/GEval.py) has been introduced, which integrates DeepEval's functionality into the ValidMind testing framework. This scorer sets up evaluation using the provided criteria, rubric, and evaluation steps, processes each test case from a dataset, and returns scores along with detailed explanations (a minimal sketch of this scoring loop follows directly after this list).
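
The sketch below illustrates the per-test-case scoring loop described in item 5. It is not the actual validmind/scorer/llm/deepeval/GEval.py module: the column names ("input", "actual_output"), the criteria string, and the returned columns are assumptions chosen to mirror the summary, and only DeepEval's documented GEval API is used.

```python
# Hedged sketch of the scoring loop from item 5 (not the real ValidMind module).
# Column names and the criteria text are assumptions for illustration only.
import pandas as pd
from deepeval.metrics import GEval
from deepeval.test_case import LLMTestCase, LLMTestCaseParams


def score_rows(df: pd.DataFrame, criteria: str) -> pd.DataFrame:
    """Score each row with GEval and return score, reason, and criteria per row."""
    metric = GEval(
        name="CustomCriteria",
        criteria=criteria,
        evaluation_params=[LLMTestCaseParams.INPUT, LLMTestCaseParams.ACTUAL_OUTPUT],
    )
    results = []
    for _, row in df.iterrows():
        case = LLMTestCase(input=row["input"], actual_output=row["actual_output"])
        metric.measure(case)
        results.append({"score": metric.score, "reason": metric.reason, "criteria": criteria})
    return pd.DataFrame(results)
```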

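Similarly, for the refactor in item 3, the following is a hypothetical illustration of the helper split and the empty-row safeguard; the field names, helper signatures, and OPTIONAL_FIELDS list are assumptions, not the real agent_dataset.py code.

```python
# Hypothetical illustration of the helper structure described in item 3.
# Field names and signatures are assumed, not copied from agent_dataset.py.
import pandas as pd

OPTIONAL_FIELDS = ["expected_output", "context", "retrieval_context"]  # assumed fields


def _add_optional_fields(row: dict, item) -> dict:
    """Attach optional attributes to the row only when they are present."""
    for field in OPTIONAL_FIELDS:
        value = getattr(item, field, None)
        if value is not None:
            row[field] = value
    return row


def _process_test_cases(test_cases) -> list:
    """Convert test cases to row dicts, keeping only populated optional fields."""
    return [
        _add_optional_fields({"input": c.input, "actual_output": c.actual_output}, c)
        for c in test_cases
    ]


def _process_goldens(goldens) -> list:
    """Convert goldens (expected examples without model output) to row dicts."""
    return [_add_optional_fields({"input": g.input}, g) for g in goldens]


def to_dataframe(test_cases=None, goldens=None) -> pd.DataFrame:
    rows = _process_test_cases(test_cases or []) + _process_goldens(goldens or [])
    if not rows:
        rows = [{}]  # safeguard: yield a single empty row when there is no data
    return pd.DataFrame(rows)
```
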
Test Suggestions

  • Run the new GEval integration notebook to verify that test cases are scored correctly and that the resulting evaluations (score, reason, criteria) match expectations.
  • Validate that the agent dataset conversion handles both complete and missing optional fields, ensuring that empty rows are generated if necessary.
  • Execute the plotting tests (BoxPlot) to confirm that the updated plot dimensions and spacing yield clear and readable visual outputs.
  • Perform dependency and compatibility tests to verify that the new version constraints (e.g., numpy, pyarrow, fsspec) work as intended in the supported Python versions.

@AnilSorathiya AnilSorathiya merged commit 10e4bdc into main Oct 28, 2025
17 checks passed
@AnilSorathiya AnilSorathiya deleted the anilsorathiya/sc-12707/add-g-eval-test-in-lib branch October 28, 2025 17:24

Labels

enhancement (New feature or request)
